fix(cypher,store): prevent crashes from buffer overflow, OOM, and NULL stmts#206
fix(cypher,store): prevent crashes from buffer overflow, OOM, and NULL stmts#206map588 wants to merge 1 commit intoDeusData:mainfrom
Conversation
map588
commented
Apr 5, 2026
- cypher: Add bounds check in lex_string_literal to prevent stack buffer overflow on string literals >4096 bytes
- cypher: Add malloc/calloc NULL checks in parse_props, parse_rel_types, parse_in_condition, and parse_case_expr to prevent OOM crashes
- store: Add sqlite3_prepare_v2 return code checks at 3 sites in cbm_store_schema_info and collect_pkg_names to prevent NULL stmt dereference on DB corruption
There was a problem hiding this comment.
Pull request overview
This PR hardens the Cypher lexer/parser and the SQLite-backed store layer against crash scenarios triggered by unusually large inputs, out-of-memory conditions, and SQLite statement preparation failures (e.g., due to DB corruption).
Changes:
- Cypher lexer: add a bounds check in
lex_string_literaland a regression test for oversized string literals. - Cypher parser: add initial
malloc/callocNULL checks in several parse helpers to reduce OOM crash risk. - Store: add
sqlite3_prepare_v2return-code/NULL-stmt checks at three call sites to avoid NULLsqlite3_stmt*dereferences.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
src/cypher/cypher.c |
Adds string-literal truncation logic and some initial allocation NULL checks in parsing helpers. |
src/store/store.c |
Adds SQLite prepare checks to avoid dereferencing NULL statements during schema/package collection. |
tests/test_cypher.c |
Adds a regression test to ensure oversized string literals don’t overflow lexer buffers. |
Comments suppressed due to low confidence (2)
src/store/store.c:2564
arris allocated without a NULL check. On OOM this will be NULL and thearr[n]...writes in the row loop will crash;safe_realloccan also return NULL later in the loop. Since this PR is explicitly hardening OOM cases, consider adding allocation checks and returning an error (finalizingstmtfirst) if allocation fails.
int cap = ST_INIT_CAP_8;
int n = 0;
cbm_label_count_t *arr = malloc(cap * sizeof(cbm_label_count_t));
while (sqlite3_step(stmt) == SQLITE_ROW) {
if (n >= cap) {
src/store/store.c:2591
arris allocated without a NULL check. On OOM this will be NULL and thearr[n]...writes in the row loop will crash;safe_realloccan also return NULL later in the loop. Please add allocation-result checks and handle failure (including finalizingstmt) to fully prevent OOM crashes.
int cap = ST_INIT_CAP_8;
int n = 0;
cbm_type_count_t *arr = malloc(cap * sizeof(cbm_type_count_t));
while (sqlite3_step(stmt) == SQLITE_ROW) {
if (n >= cap) {
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/cypher/cypher.c
Outdated
| int blen = 0; | ||
| const int max_blen = CBM_SZ_4K - 1; | ||
| while (*pos < len && input[*pos] != quote) { | ||
| if (blen >= max_blen) { (*pos)++; continue; } |
There was a problem hiding this comment.
The truncation path (if (blen >= max_blen) { (*pos)++; continue; }) skips escape-handling, so once the buffer is full a sequence like \" can be mis-tokenized: the backslash is skipped but the following quote is then seen as the closing delimiter, ending the string early. Consider still consuming escape sequences while truncating (advance past both '\' and the escaped char) so the lexer always finds the real closing quote.
| if (blen >= max_blen) { (*pos)++; continue; } | |
| if (blen >= max_blen) { | |
| if (input[*pos] == '\\' && *pos + SKIP_ONE < len) { | |
| *pos += SKIP_ONE; | |
| } | |
| (*pos)++; | |
| continue; | |
| } |
| if (sqlite3_prepare_v2(s->db, sql, CBM_NOT_FOUND, &stmt, NULL) != SQLITE_OK || !stmt) { | ||
| return CBM_NOT_FOUND; | ||
| } |
There was a problem hiding this comment.
If the edge-types sqlite3_prepare_v2 fails, this returns immediately without freeing any node-label data already stored in out from the previous block. Some callers only call cbm_store_schema_free on CBM_STORE_OK (e.g., UI layout3d), so this can leak memory. Consider using a single cleanup path that frees any partially-filled out before returning an error.
| int cap = CYP_INIT_CAP4; | ||
| int n = 0; | ||
| cbm_prop_filter_t *arr = malloc(cap * sizeof(cbm_prop_filter_t)); | ||
| if (!arr) { | ||
| return CBM_NOT_FOUND; | ||
| } |
There was a problem hiding this comment.
This adds an initial malloc NULL-check, but the function still calls safe_realloc and heap_strdup later without checking for NULL. Since safe_realloc returns NULL (and frees the old pointer), an OOM during growth or key/value duplication can still lead to NULL dereferences (e.g., later strcmp on a NULL props[i].value). Consider propagating an error if any allocation/duplication fails and freeing partially-built arr.
| int cap = CYP_INIT_CAP4; | ||
| int n = 0; | ||
| const char **types = malloc(cap * sizeof(const char *)); | ||
| if (!types) { | ||
| return CBM_NOT_FOUND; | ||
| } |
There was a problem hiding this comment.
This adds an initial malloc NULL-check, but types can still become NULL if a later safe_realloc fails (it returns NULL and frees the old buffer), and heap_strdup can return NULL as well. To fully prevent OOM crashes, please check realloc/duplication results, free any already-duplicated strings on failure, and return an error.
| int vcap = CYP_INIT_CAP8; | ||
| int vn = 0; | ||
| const char **vals = malloc(vcap * sizeof(const char *)); | ||
| if (!vals) { | ||
| free((void *)c->variable); | ||
| free((void *)c->property); | ||
| free((void *)c->op); | ||
| return NULL; |
There was a problem hiding this comment.
Good to check the initial malloc, but the list-growth path later uses safe_realloc without checking for NULL (it frees the old pointer on failure), and each element is populated via heap_strdup which may return NULL. Either can still produce a NULL dereference later when evaluating IN conditions (e.g., strcmp(actual, c->in_values[i])). Please add checks and clean up partial allocations on failure.
| cbm_case_expr_t *kase = calloc(CBM_ALLOC_ONE, sizeof(cbm_case_expr_t)); | ||
| if (!kase) { | ||
| return NULL; | ||
| } | ||
| int bcap = CYP_INIT_CAP4; | ||
| kase->branches = malloc(bcap * sizeof(cbm_case_branch_t)); | ||
| if (!kase->branches) { | ||
| free(kase); | ||
| return NULL; | ||
| } |
There was a problem hiding this comment.
This checks calloc/malloc for the initial CASE expression allocations, but the branch-growth path later uses safe_realloc without checking for NULL (and safe_realloc frees the old buffer on failure). To avoid NULL dereferences/leaks on OOM, please check the realloc result and free kase/branches before returning an error.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if (n >= cap) { | ||
| cap *= PAIR_LEN; | ||
| arr = safe_realloc(arr, cap * sizeof(cbm_prop_filter_t)); | ||
| if (!arr) { | ||
| return CBM_NOT_FOUND; | ||
| } |
There was a problem hiding this comment.
safe_realloc frees the original buffer on failure (see platform.h), so if this growth realloc fails you return with arr already freed but any previously heap_strdup'd keys/values are leaked (and you can't clean them up anymore). Use a non-destructive realloc pattern (tmp=realloc; on failure, free arr[i].key/value for i<n, then free(arr)) so OOM handling is leak-free.
| if (n >= cap) { | ||
| cap *= PAIR_LEN; | ||
| types = safe_realloc(types, cap * sizeof(const char *)); | ||
| if (!types) { | ||
| return CBM_NOT_FOUND; | ||
| } |
There was a problem hiding this comment.
Same safe_realloc issue as in parse_props: on realloc failure the types array is freed, but already-duplicated strings in types[0..n-1] leak. Prefer realloc into a temporary pointer so you can free the existing types[i] strings and the array on failure.
src/cypher/cypher.c
Outdated
| vcap *= PAIR_LEN; | ||
| vals = safe_realloc(vals, vcap * sizeof(const char *)); | ||
| if (!vals) { | ||
| free((void *)c->variable); | ||
| free((void *)c->property); | ||
| free((void *)c->op); | ||
| return NULL; | ||
| } |
There was a problem hiding this comment.
On realloc failure, safe_realloc frees the existing vals array, so returning here leaks all heap_strdup'd values already stored in vals[0..vn-1]. Use realloc into a temp pointer (or a realloc helper that does not free on failure) so you can free the accumulated vals[i] strings + vals before returning.
| vcap *= PAIR_LEN; | |
| vals = safe_realloc(vals, vcap * sizeof(const char *)); | |
| if (!vals) { | |
| free((void *)c->variable); | |
| free((void *)c->property); | |
| free((void *)c->op); | |
| return NULL; | |
| } | |
| const char **new_vals; | |
| int i; | |
| vcap *= PAIR_LEN; | |
| new_vals = realloc((void *)vals, vcap * sizeof(const char *)); | |
| if (!new_vals) { | |
| for (i = 0; i < vn; i++) { | |
| free((void *)vals[i]); | |
| } | |
| free((void *)vals); | |
| free((void *)c->variable); | |
| free((void *)c->property); | |
| free((void *)c->op); | |
| return NULL; | |
| } | |
| vals = new_vals; |
| if (kase->branch_count >= bcap) { | ||
| bcap *= PAIR_LEN; | ||
| kase->branches = safe_realloc(kase->branches, bcap * sizeof(cbm_case_branch_t)); | ||
| if (!kase->branches) { | ||
| expr_free(when); | ||
| free(kase); | ||
| return NULL; | ||
| } |
There was a problem hiding this comment.
On branch growth, safe_realloc frees the existing kase->branches array on failure; returning here then leaks all previously parsed when_expr trees and then_val strings already stored in the old branches (since you no longer have the array to iterate). Switch to a non-destructive realloc pattern and, on failure, call free_case_expr(kase) (or equivalent partial cleanup) before returning.
| if (sqlite3_prepare_v2(s->db, sql, CBM_NOT_FOUND, &stmt, NULL) != SQLITE_OK || !stmt) { | ||
| return CBM_NOT_FOUND; | ||
| } |
There was a problem hiding this comment.
If sqlite3_prepare_v2 fails it can still leave a non-NULL statement handle in some cases (e.g., partial compile). Returning without finalizing risks leaking stmt. Consider capturing rc, and if stmt is non-NULL always sqlite3_finalize(stmt) before returning (and optionally record the SQLite error).
| if (sqlite3_prepare_v2(s->db, sql, CBM_NOT_FOUND, &stmt, NULL) != SQLITE_OK || !stmt) { | ||
| cbm_store_schema_free(out); | ||
| return CBM_NOT_FOUND; | ||
| } |
There was a problem hiding this comment.
Same as above: on prepare failure, ensure any non-NULL stmt is finalized before returning. Right now this path can leak stmt (and you already free out, which is good).
| int max_pkgs) { | ||
| sqlite3_stmt *stmt = NULL; | ||
| sqlite3_prepare_v2(s->db, sql, CBM_NOT_FOUND, &stmt, NULL); | ||
| if (sqlite3_prepare_v2(s->db, sql, CBM_NOT_FOUND, &stmt, NULL) != SQLITE_OK || !stmt) { |
There was a problem hiding this comment.
On prepare failure, consider finalizing stmt if it is non-NULL before returning to avoid leaking a partially prepared statement. Also consider recording the underlying SQLite error so callers have some diagnostics when DB corruption triggers this path.
| if (sqlite3_prepare_v2(s->db, sql, CBM_NOT_FOUND, &stmt, NULL) != SQLITE_OK || !stmt) { | |
| int rc = sqlite3_prepare_v2(s->db, sql, CBM_NOT_FOUND, &stmt, NULL); | |
| if (rc != SQLITE_OK || !stmt) { | |
| if (stmt != NULL) { | |
| sqlite3_finalize(stmt); | |
| } |
…L stmts - cypher: Add bounds check in lex_string_literal to prevent stack buffer overflow on string literals >4096 bytes. Escape sequences are always parsed correctly even past the truncation boundary. - cypher: Add malloc/calloc NULL checks in parse_props, parse_rel_types, parse_in_condition, and parse_case_expr. Growth paths use non-destructive realloc (temp pointer) so accumulated elements can be freed on OOM instead of leaking through safe_realloc's free-on-failure semantics. - store: Add sqlite3_prepare_v2 return code checks at 3 sites in cbm_store_schema_info and collect_pkg_names. Partially prepared statements are finalized before returning. Schema function cleans up partially populated output on failure. collect_pkg_names returns CBM_NOT_FOUND (not 0) to distinguish errors from empty results.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| arr = tmp; | ||
| cap = new_cap; | ||
| } | ||
| arr[n].key = heap_strdup(key->text); | ||
| arr[n].value = heap_strdup(val->text); |
There was a problem hiding this comment.
parse_props assigns arr[n].key/value from heap_strdup without checking for NULL. If heap_strdup fails, later code like check_inline_props() will call strcmp(props[i].value, ...) and can segfault. After heap_strdup, validate the returned pointers; on failure, free any previously allocated entries (and arr itself) and return an error.
| types = (const char **)tmp; | ||
| cap = new_cap; | ||
| } | ||
| types[n++] = heap_strdup(t->text); | ||
| } |
There was a problem hiding this comment.
parse_rel_types stores relationship type strings via heap_strdup but never checks for NULL. If an allocation fails, rel->types will contain NULL and later strcmp/SQLite binding paths can dereference it. After heap_strdup, check for NULL and, if needed, free any previously duplicated types and return an error.
| } | ||
| vals = (const char **)tmp; | ||
| vcap = new_vcap; | ||
| } | ||
| vals[vn++] = heap_strdup(advance(p)->text); | ||
| } else { |
There was a problem hiding this comment.
parse_in_list appends list items via heap_strdup without checking for NULL. eval_condition later does strcmp(actual, c->in_values[i]) and will crash if any entry is NULL. Check heap_strdup’s return; on failure, free vals (including already-duplicated strings) and c’s allocated fields, then return NULL with an error set.
| if (!tmp) { | ||
| expr_free(when); | ||
| for (int i = 0; i < kase->branch_count; i++) { | ||
| expr_free(kase->branches[i].when_expr); |
There was a problem hiding this comment.
On realloc failure, the cleanup loop frees only when_expr for existing branches but not branches[i].then_val. That leaks the already-duplicated THEN values (and can be substantial for large queries) right in the OOM path this change is trying to harden. Free then_val for each existing branch in the failure cleanup before freeing branches/kase.
| expr_free(kase->branches[i].when_expr); | |
| expr_free(kase->branches[i].when_expr); | |
| free((void *)kase->branches[i].then_val); |
| if (stmt) { | ||
| sqlite3_finalize(stmt); | ||
| } | ||
| return CBM_NOT_FOUND; |
There was a problem hiding this comment.
collect_pkg_names returns CBM_NOT_FOUND on sqlite3_prepare_v2 failure, which is the same numeric value as CBM_STORE_ERR (-1) but semantically inconsistent with other store helpers that return CBM_STORE_ERR and call store_set_error_sqlite(). Consider returning CBM_STORE_ERR (or 0 if you intend “no pkgs”) and setting the store error so downstream callers can diagnose DB corruption.
| return CBM_NOT_FOUND; | |
| store_set_error_sqlite(s, "sqlite3_prepare_v2 failed in collect_pkg_names"); | |
| return CBM_STORE_ERR; |